Method and system for reducing a voice signal noise

ABSTRACT

A method is provided whereby, before being subjected to low-rate voice coding, an incoming digital voice signal is chronologically segmented into blocks, the blocks are broken down, in chronological order, into frequency components by a transformation into the frequency domain, and the frequency components are multiplied by frequency-dependent weighting factors that are modifiable over time, a frequency component being multiplied by the weighting factor last calculated for that frequency component if that factor is smaller than the current weighting factor.

BACKGROUND OF THE INVENTION

The present invention relates to a method and a system for voice processing; in particular, for processing noise in a voice signal.

The rapid pace of technical development in the area of mobile communication has led to constantly increasing demands on voice processing in recent years, particularly on voice encoding and noise suppression. This is attributable in no small measure to the restricted availability of bandwidth and to constantly increasing demands on voice quality.

A major component of voice processing is estimating the noise signal or interference by which, for example, a voice signal captured by a microphone is normally affected and, if necessary, suppressing it in the input signal so that, where possible, only the voice signal is transmitted. However, conventional noise suppression methods frequently produce undesired artifacts in the background signal, also referred to as musical tones.

An object of the present invention, therefore, is to provide a technical solution that allows high-quality voice transmission at a low data rate.

SUMMARY OF THE INVENTION

The present invention is, thus, directed toward multiplying the frequency components of a voice signal affected by a noise signal, before encoding with a low-rate voice codec, by frequency-dependent weighting factors which change over time, where a frequency component is multiplied by the current weighting factor if the current weighting factor is smaller than the weighting factor last calculated for the respective frequency component, and where a frequency component is multiplied by the weighting factor last calculated for that frequency component if the weighting factor last calculated is smaller than the current weighting factor. A low-rate voice codec here refers in particular to a voice codec which delivers a data rate of less than 5 Kbits per second.

This has the effect of attenuating a noise signal superimposed on a voice signal in such a way as to enable good-quality voice transmission with minimal use of computing and memory resources.

The present invention initially stems from the knowledge that, when low-rate voice codecs are used, good voice quality can only be obtained if the artifacts explained above are avoided or reduced as much as possible. This was determined using elaborate simulation tools created specifically for this purpose.

The present invention further stems from the knowledge that, as elaborate simulations also show, the targeted use of current or recently calculated weighting factors reduces artifacts in the background signal, particularly during voice pauses.

This advantageous effect of the present invention, namely the combination of a specific noise suppression method with a low-rate voice codec delivering a data rate between 3 Kbits per second and 5 Kbits per second, has been confirmed by comprehensive simulations.

Additional features and advantages of the present invention are described in, and will be apparent from, the following Detailed Description of the Invention and the Figures.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 shows a simplified block diagram of a method for voice processing.

FIG. 2 shows a flowchart of a method for noise suppression.

FIG. 3 shows a simplified block diagram of a system for voice processing.

DETAILED DESCRIPTION OF THE INVENTION

FIG. 1 shows a block diagram of a method for voice processing. This method can be roughly divided into the interoperating blocks of noise suppression and the downstream low-rate voice codec NSC. A low-rate voice codec delivering a data rate of, for example, 4 Kbits per second is known per se and thus will not be described in any greater detail at this point.

The noise suppression method can be subdivided into a number of functional blocks, which are explained below.

The blocks Analysis AN and Synthesis SY form the framework of the noise suppression method. The segmentation of the input signal performed prior to the analysis AN (not shown in FIG. 1), as well as the block sizes used, are tailored to the low-rate voice codec in such a way that the algorithmic delay caused by the noise suppression remains as small as possible. The input signal x(k) is segmented, for example, into blocks of 20 ms at a sampling rate of 8 kHz. The processed data can also be passed on to the voice codec in segments of the same block length.
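At a sampling rate of 8 kHz, a 20 ms block holds 8000 × 0.020 = 160 samples. The following is a minimal sketch of the segmentation step in Python with NumPy, assuming non-overlapping blocks; the function name `segment_into_blocks` is hypothetical and not part of the described method.

```python
import numpy as np

def segment_into_blocks(x, fs=8000, block_ms=20):
    """Split x(k) into consecutive, non-overlapping blocks of block_ms milliseconds.

    Trailing samples that do not fill a whole block are dropped here; a real-time
    implementation would instead keep them until the next call (assumption).
    """
    block_len = fs * block_ms // 1000          # 8000 * 20 / 1000 = 160 samples
    n_blocks = len(x) // block_len
    return np.asarray(x[: n_blocks * block_len]).reshape(n_blocks, block_len)

x = np.arange(8000)             # one second of dummy samples at 8 kHz
blocks = segment_into_blocks(x)
print(blocks.shape)             # (50, 160)
```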

The analysis AN in this case may include windowing, zero-padding and a transformation into the frequency domain by a Fourier transformation, and the synthesis SY may include a back transformation into the time domain by an inverse Fourier transformation and a signal reconstruction in accordance with the overlap-add method.
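The analysis and synthesis blocks can be sketched as below. The window type (periodic Hann), the 50 % overlap (hop of 80 samples) and the zero-padded FFT length of 256 are assumptions chosen for illustration; the text only names windowing, zero-padding, Fourier transformation and overlap-add. With a periodic Hann window at 50 % overlap the shifted windows sum to one, so with all weighting factors equal to one the output reproduces the input away from the signal edges.

```python
import numpy as np

FRAME = 160    # 20 ms at 8 kHz
HOP = 80       # 50 % overlap (assumption)
NFFT = 256     # zero-padded FFT length (assumption)
# Periodic Hann window: copies shifted by HOP add up to exactly 1.
WINDOW = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(FRAME) / FRAME)

def analysis(x):
    """Window, zero-pad and FFT each frame; returns X[m, i] (frame m, bin i)."""
    n_frames = 1 + (len(x) - FRAME) // HOP
    frames = np.stack([x[m * HOP : m * HOP + FRAME] * WINDOW for m in range(n_frames)])
    return np.fft.rfft(frames, n=NFFT, axis=1)   # rfft zero-pads each frame to NFFT

def synthesis(X, length):
    """Inverse FFT of each frame followed by overlap-add reconstruction."""
    frames = np.fft.irfft(X, n=NFFT, axis=1)
    y = np.zeros(length + NFFT)
    for m, frame in enumerate(frames):
        y[m * HOP : m * HOP + NFFT] += frame     # the zero-padded tail is added too
    return y[:length]

rng = np.random.default_rng(0)
x = rng.standard_normal(1600)                    # 200 ms of test signal
X = analysis(x)                                  # frequency components X(i, m)
y = synthesis(X, len(x))                         # unity weights: y matches x inside
print(np.allclose(y[HOP:len(x) - HOP], x[HOP:len(x) - HOP]))  # True
```

The zero-padding leaves room for the time-domain spread introduced when the spectrum is multiplied by weighting factors, which the overlap-add then accumulates instead of wrapping it around circularly.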

The frequency components obtained from the analysis AN have a real and an imaginary part or, equivalently, a magnitude and a phase. To save effort, the magnitudes of adjacent frequency components are first combined into frequency groups on the basis of a Bark table by the frequency group combination FGZU1.
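A sketch of the frequency group combination follows. The band edges are an assumption: Zwicker's critical-band edges, with the lowest edge set to 0 Hz and the last band cut off at the 4 kHz Nyquist frequency, since the text does not list the values of its Bark table. Averaging the magnitudes within a group is likewise an assumed choice; summing would work the same way.

```python
import numpy as np

FS = 8000
NFFT = 256
# Assumed Bark (critical-band) edges in Hz after Zwicker, truncated at 4 kHz.
BARK_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270,
                 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4000]

def bark_group_index(nfft=NFFT, fs=FS):
    """Map each rfft bin to the index of the Bark group that contains it."""
    bin_freqs = np.arange(nfft // 2 + 1) * fs / nfft
    # Band whose lower edge is <= the bin frequency; clip keeps the
    # Nyquist bin (4000 Hz) inside the last band.
    idx = np.searchsorted(BARK_EDGES_HZ, bin_freqs, side="right") - 1
    return np.clip(idx, 0, len(BARK_EDGES_HZ) - 2)

def combine_groups(mag, group_idx):
    """Frequency group combination: mean magnitude of the bins in each group."""
    n_groups = group_idx.max() + 1
    sums = np.bincount(group_idx, weights=mag, minlength=n_groups)
    counts = np.bincount(group_idx, minlength=n_groups)
    return sums / counts

group_idx = bark_group_index()
mag = np.abs(np.fft.rfft(np.random.default_rng(1).standard_normal(160), n=NFFT))
groups = combine_groups(mag, group_idx)
print(len(groups))   # 18 groups between 0 and 4 kHz
```

With 129 bins reduced to 18 groups, the later per-group steps run on roughly a seventh of the values, which is the saving in effort the text refers to.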

For each frequency group, a gain calculation VB is executed on the basis of an A-priori and an A-posteriori signal-to-noise ratio, which results in weighting factors for the magnitudes of the individual frequency groups. The A-priori signal-to-noise ratio can be derived from the power density spectrum of the disturbed input signal and the A-priori noise estimation GS. The A-posteriori signal-to-noise ratio can be calculated from the power density spectrum of the disturbed input signal and the output signal of a buffering P, which in turn is fed with the corrected frequency components combined by a frequency group combination FGZU2.

Before the frequency components previously combined into frequency groups are decomposed again by the frequency group decomposition FGZE, and before each frequency component is multiplied, for noise suppression, by the weighting factor calculated for its frequency group, the weighting factors are passed through what is known as a minimum filter MF, which will be explained in more detail later on the basis of FIG. 2.

Thus, for noise estimation, the power density of the background noise is essentially estimated from the input signal. To reduce the required computing power and memory, the A-priori noise estimation, the gain calculation, the buffering of the signal magnitude modified for noise suppression and the minimum filter are executed in only a few subbands. For this purpose, the magnitudes of the input signal transformed into the frequency domain and of the signal modified for noise suppression are combined into subbands by the two frequency group combination blocks. The width of the subbands follows the Bark scale and thus varies with frequency. The output signal of the minimum filter for each frequency group is distributed by the frequency group decomposition block to the corresponding frequency components, i.e., Fourier coefficients. In another embodiment, to calculate the input signal of the buffering block, the combined magnitude of the input signal can be multiplied element-by-element by the output signal of the minimum filter instead of using a frequency group combination of the signal modified for noise suppression.
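The frequency group decomposition and the alternative buffering input can be sketched in a few lines. The group assignment and all numbers below are a hypothetical toy example (6 bins in 3 groups), and the function name `decompose_groups` is likewise illustrative.

```python
import numpy as np

def decompose_groups(group_values, group_idx):
    """Frequency group decomposition: each bin takes the value of its group."""
    return np.asarray(group_values)[group_idx]

# Hypothetical toy example: 6 bins in 3 groups, one weighting factor per group.
group_idx = np.array([0, 0, 1, 1, 1, 2])
weights_per_group = np.array([0.2, 0.5, 1.0])      # minimum filter output per group
weights_per_bin = decompose_groups(weights_per_group, group_idx)
print(weights_per_bin)                             # [0.2 0.2 0.5 0.5 0.5 1. ]

# Alternative buffering input: combined input magnitude per group multiplied
# element-by-element with the minimum filter output, so no second frequency
# group combination of the modified signal is needed.
combined_input_mag = np.array([4.0, 2.0, 1.0])
buffer_input = combined_input_mag * weights_per_group
print(buffer_input)                                # [0.8 1.  1. ]
```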

In addition to the noise estimation, there is an A-posteriori estimation of the voice signal component. For this purpose, the magnitude values modified for noise reduction, combined into frequency groups, are stored in the buffering block. The output signals of the A-priori noise estimation and of the buffering are used, together with the magnitude values of the input signal combined into frequency groups, to calculate the gain. The gain calculation yields weighting factors, which are fed to a minimum filter, explained in more detail below. The minimum filter finally determines the weighting factors used for multiplication with the frequency components of the frequency groups.
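The text does not state the gain formula. As a stand-in, the sketch below uses a swapped-in, widely known technique: a decision-directed SNR estimate in the style of Ephraim and Malah combined with a Wiener gain. It consumes the same three inputs named above: the grouped input power, the noise estimate and the buffered, noise-reduced power from the previous block. The smoothing constant `alpha`, the gain floor `g_min` and the function name are assumptions, and the labels "a priori"/"a posteriori" in the literature do not map one-to-one onto the terms used in this document.

```python
import numpy as np

def wiener_gain_dd(input_power, noise_power, prev_clean_power, alpha=0.98, g_min=0.1):
    """Per-group weighting factors from a decision-directed SNR estimate (stand-in).

    input_power      : grouped |X|^2 of the disturbed input, current block
    noise_power      : grouped noise power estimate
    prev_clean_power : grouped squared magnitude from the buffering (previous block)
    alpha, g_min     : smoothing constant and gain floor (assumed values)
    """
    eps = 1e-12
    ratio_in = input_power / (noise_power + eps)         # input-to-noise ratio
    ratio_buf = prev_clean_power / (noise_power + eps)   # buffered-output-to-noise ratio
    snr = alpha * ratio_buf + (1 - alpha) * np.maximum(ratio_in - 1.0, 0.0)
    gain = snr / (1.0 + snr)                             # Wiener gain
    return np.maximum(gain, g_min)

g = wiener_gain_dd(input_power=np.array([1.0, 100.0]),
                   noise_power=np.array([1.0, 1.0]),
                   prev_clean_power=np.array([0.0, 99.0]))
print(g)   # approximately [0.1, 0.99]
```

A noise-only group is pushed down to the floor, while a group dominated by speech keeps a factor close to one.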

Using the flowchart shown in FIG. 2, a simplified embodiment for noise suppression of a voice signal will now be explained in more detail. In this case, the frequency group combination blocks FGZU1, FGZU2 and the frequency group decomposition FGZE shown in FIG. 1 are not used.

Disturbed voice signals picked up by a microphone are converted by a sampling unit and a downstream analog/digital converter into an incoming digital voice signal s(k) affected by disturbances n(k). This input signal is segmented chronologically into blocks (block, m) (101), and the blocks are mapped in chronological order by a transformation into the frequency domain onto frequency components f(i,m) (102), with m denoting the time (block index) and i the frequency. This can be done by a Fourier transformation, for example. If the Fourier coefficients of the input signal are denoted by X(i,m), the values |X(i,m)|^2 can be used as the frequency components.

After the segmentation (101) and the transformation into the frequency domain (102) explained above, the frequency components f(i,m) of the voice signal are multiplied by a weighting factor H(i,m), which can be derived, for example, from the estimated A-priori and A-posteriori signal-to-noise ratios already explained above. The A-priori signal-to-noise ratio can be derived from the power density spectrum of the disturbed input signal and the A-priori noise estimation. The A-posteriori signal-to-noise ratio can be calculated from the power density spectrum of the disturbed input signal and the output signal of the buffering.

The frequency-dependent weighting factor is modifiable over time and is continuously updated to follow the chronologically changing frequency components. To avoid undesired artifacts in the background signal, however, a minimum filter is implemented: the weighting factor H(i,m) currently calculated for a frequency component f(i,m) is used for the multiplication only if it is smaller than the weighting factor last calculated for that frequency component, that is, in the previous step, H(i,m−1). Otherwise, the frequency component is multiplied by H(i,m−1). The applied weighting factor is thus min(H(i,m), H(i,m−1)).
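The minimum filter rule can be sketched as follows, with block index m along axis 0 and frequency index i along axis 1. The sketch assumes, as the text states, that the comparison is made against the raw factor H(i,m−1) last calculated, not against the factor applied in the previous block; the first block has no predecessor and is passed through unchanged (also an assumption). The function name is hypothetical.

```python
import numpy as np

def minimum_filter(H):
    """Return min(H(i,m), H(i,m-1)) for every block m > 0.

    H : array of shape (n_blocks, n_bins) with the raw weighting factors.
    """
    H = np.asarray(H, dtype=float)
    out = H.copy()
    out[1:] = np.minimum(H[1:], H[:-1])   # compare with the raw previous factor
    return out

# Column 0: an isolated one-block peak.  Column 1: a sustained rise.
H = np.array([[0.2, 0.3],
              [0.9, 0.8],
              [0.2, 0.8]])
print(minimum_filter(H))
# [[0.2 0.3]
#  [0.2 0.3]
#  [0.2 0.8]]
```

A factor that rises for only a single block never reaches the output, while a sustained rise takes effect one block later and decreases take effect immediately. Isolated short-lived spectral peaks are a known source of musical tones, which illustrates why this filter helps in voice pauses.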

One embodiment of the present invention provides for a frequency component to be multiplied by the current weighting factor when the frequency-dependent weighting factor lies above a threshold value, even if the weighting factor last calculated for this frequency component is smaller than the current weighting factor.

Such an embodiment may be implemented by a filter which, for each frequency, compares the current weighting factor with the chronologically previous weighting factor and selects the smaller of the two values for application to the frequency component. If the current weighting factor exceeds the fixed threshold value of 0.76, the filter does not modify it, and the frequency component is multiplied by the current weighting factor.
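Extending the minimum filter with the fixed threshold of 0.76 can be sketched as below. As before, the array layout (blocks along axis 0, frequencies along axis 1), the pass-through of the first block and the function name are assumptions made for illustration; "exceeds" is read as strictly greater than the threshold.

```python
import numpy as np

THRESHOLD = 0.76   # fixed threshold value given in the text

def minimum_filter_with_threshold(H, threshold=THRESHOLD):
    """min(H(i,m), H(i,m-1)), except that factors above the threshold pass unchanged."""
    H = np.asarray(H, dtype=float)
    out = H.copy()
    prev_min = np.minimum(H[1:], H[:-1])
    # Current factors above the threshold bypass the minimum filter.
    out[1:] = np.where(H[1:] > threshold, H[1:], prev_min)
    return out

H = np.array([[0.3, 0.3],
              [0.9, 0.7],
              [0.5, 0.7]])
print(minimum_filter_with_threshold(H))
# [[0.3 0.3]
#  [0.9 0.3]
#  [0.5 0.7]]
```

In column 0 the jump to 0.9 exceeds the threshold and is applied at once, so strong speech onsets are not delayed; in column 1 the rise to 0.7 stays below the threshold and is held back for one block as in the plain minimum filter.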

FIG. 3 shows a programmable processor unit PE, such as a microcontroller, which may also include a processor CPU and a memory unit SPE.

Depending on the embodiment, further components may be arranged within or outside the processor unit PE which are assigned to the processor unit, belong to it, are controlled by it or control it; their function in conjunction with the processor unit is sufficiently known to a person skilled in the art and thus will not be described in any greater detail at this point. The various components may exchange data with the processor unit PE via a bus system BUS or input/output interfaces IOS and, where necessary, suitable controllers (not shown). In such cases, the processor unit PE may be an element of an electronic device, such as an electronic communication terminal or a mobile telephone, and may control other specific methods and applications of the electronic device.

Depending on the embodiment, the memory unit SPE, which may include one or more volatile (RAM) or non-volatile (ROM) memory modules, or parts of the memory unit SPE can be implemented as part of the processor unit (shown in FIG. 3) or as an external memory unit (not shown in FIG. 3) located outside the processor unit PE, or even outside the device containing the processor unit PE, and connected to the processor unit PE by lines or a bus system.

The program data for controlling the device and for carrying out the voice processing and noise suppression method is stored in the memory unit SPE. Implementing the above-mentioned functional components by programmable processors or by dedicated microcircuits is within the knowledge of a person skilled in the art.

The digital voice signals affected by disturbances may be fed to the processor unit PE via the input/output interface IOS. In addition to the processor CPU, a digital signal processor DSP may be provided to execute all or some of the steps of the method explained above.

Although the present invention has been described with reference to specific embodiments, those of skill in the art will recognize that changes may be made thereto without departing from the spirit and scope of the present invention as set forth in the hereafter appended claims.

1. A method for voice processing, comprising: segmenting an incoming digital voice signal chronologically into blocks; mapping the blocks in chronological order, by a transformation in a respective frequency range, onto respective frequency components; multiplying the frequency components by chronologically modifiable frequency-dependent weighting factors derived from estimated a-priori and a-posteriori signal-to-noise ratios having a plurality of values, wherein: a respective frequency component is multiplied by a current weighting factor if the current weighting factor is smaller than a weighting factor last calculated for the frequency component, and the frequency component is multiplied by the weighting factor last calculated for the frequency component if the weighting factor last calculated is smaller than the current weighting factor; and feeding the weighted frequency components back, after a back transformation in a respective time range, to a low-rate voice codec; and wherein the a-priori signal-to-noise ratio is defined as a power density spectrum of the incoming digital voice signal and an a-priori noise estimation, and the a-posteriori signal-to-noise ratio is defined as the power density spectrum of the incoming digital voice signal and an output signal of a buffering.
2. A method for voice processing as claimed in claim 1, wherein a respective frequency component is multiplied by the current weighting factor if the respective frequency-dependent weighting factor lies above a threshold value.
3. A method for voice processing as claimed in claim 1, wherein a respective frequency component is multiplied by the current weighting factor if the current weighting factor lies above a threshold value, and if the weighting factor last calculated for the frequency component is smaller than the current weighting factor.
4. A system for noise suppression, comprising: an input for digital voice signals; and a processor unit for chronologically segmenting an incoming digital voice signal into blocks, for mapping the blocks in chronological order, by a transformation in a respective frequency range, onto respective frequency components, for multiplying the frequency components by chronologically modifiable frequency-dependent weighting factors derived from estimated a-priori and a-posteriori signal-to-noise ratios having a plurality of values, wherein a respective frequency component is multiplied by a current weighting factor if the current weighting factor is smaller than a weighting factor last calculated for the frequency component, and the frequency component is multiplied by the weighting factor last calculated for the frequency component if the weighting factor last calculated is smaller than the current weighting factor, and for feeding the weighted frequency components back, after a back transformation in a respective time range, to a low-rate voice codec; wherein the a-priori signal-to-noise ratio is defined as a power density spectrum of the incoming digital voice signal and an a-priori noise estimation, and the a-posteriori signal-to-noise ratio is defined as the power density spectrum of the incoming digital voice signal and an output signal of a buffering.
5. A system for noise suppression as claimed in claim 4, wherein a respective frequency component is multiplied by the current weighting factor if the respective frequency-dependent weighting factor lies above a threshold value.
6. A system for noise suppression as claimed in claim 4, wherein a respective frequency component is multiplied by the current weighting factor if the current weighting factor lies above a threshold value, and if the weighting factor last calculated for the frequency component is smaller than the current weighting factor.